Memory interleaving coordinated by networked processing units

ABSTRACT

Various approaches for configuring interleaving in a memory pool used in an edge computing arrangement, including with the use of infrastructure processing units (IPUs) and similar networked processing units, are disclosed. An example system may discover and map disaggregated memory resources at respective compute locations connected to each other via at least one interconnect. The system may identify workload requirements for use of the compute locations by respective workloads, for workloads provided by client devices to the compute locations. The system may determine an interleaving arrangement for a memory pool that fulfills the workload requirements, and use the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources. The system may configure the memory pool for use by the client devices of the network, such that the memory pool causes the disaggregated memory resources to host data based on the interleaving arrangement.

PRIORITY CLAIM

This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/425,857, filed Nov. 16, 2022, and titled “COORDINATION OF DISTRIBUTED NETWORKED PROCESSING UNITS”, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Embodiments described herein generally relate to data processing, memory usage, network communication, and communication system implementations of distributed computing, including implementations with the use of networked processing units such as infrastructure processing units (IPUs) or data processing units (DPUs).

BACKGROUND

System architectures are moving to highly distributed multi-edge and multi-tenant deployments. Deployments may have different limitations in terms of power and space. Deployments also may use different types of compute, acceleration, and storage technologies in order to overcome these power and space limitations. Deployments also are typically interconnected in tiered and/or peer-to-peer fashion, in an attempt to create a network of connected devices and edge appliances that work together.

Edge computing, at a general level, has been described as systems that provide the transition of compute and storage resources closer to endpoint devices at the edge of a network (e.g., consumer computing devices, user equipment, etc.). As compute and storage resources are moved closer to endpoint devices, a variety of advantages have been promised, such as reduced application latency, improved service capabilities, improved compliance with security or data privacy requirements, improved backhaul bandwidth, improved energy consumption, and reduced cost. However, many deployments of edge computing technologies—especially complex deployments for use by multiple tenants—have not been fully adopted.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. Some embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which:

FIG. 1 illustrates an overview of a distributed edge computing environment, according to an example;

FIG. 2 depicts computing hardware provided among respective deployment tiers in a distributed edge computing environment, according to an example;

FIG. 3 depicts additional characteristics of respective deployment tiers in a distributed edge computing environment, according to an example;

FIG. 4 depicts a computing system architecture including a compute platform and a network processing platform provided by an infrastructure processing unit, according to an example;

FIG. 5 depicts an infrastructure processing unit arrangement operating as a distributed network processing platform within network and data center edge settings, according to an example;

FIG. 6 depicts functional components of an infrastructure processing unit and related services, according to an example;

FIG. 7 depicts a block diagram of example components in an edge computing system which implements a distributed network processing platform, according to an example;

FIG. 8 depicts a scenario of computing operations coordinated among an edge layer to establish use of a memory pool, according to an example;

FIG. 9 depicts a network architecture for establishing interleaving in a memory pool established among compute resources and memory resources of respective base stations, according to an example;

FIG. 10 depicts a scenario for deploying interleaving in a memory pool based on coordination of networked processing units, according to an example; and

FIG. 11 depicts a flowchart of a method for configuring interleaving in a memory pool established in an edge computing environment, according to an example.

DETAILED DESCRIPTION

Various approaches for memory pooling in an edge computing setting are discussed herein. Existing approaches for memory pooling are not able to dynamically and effectively allocate (and re-allocate) resources and memory regions in edge computing settings. The following applies the concept of interleaving to allow the distributed storage and retrieval of data among disaggregated memory locations in an edge layer. The following also applies estimation and prediction techniques to ensure that memory requests in the edge layer can be properly fulfilled from the distributed memory pool.

These approaches introduce a number of interfaces and logic to enable memory resources to be discovered and allocated into a pool in a highly distributed/disaggregated environment. The interfaces and logic, in various examples, perform estimation and prediction based on telemetry and network conditions. Telemetry can be continuously collected and analyzed to obtain accurate information on network conditions at any given time. Further, the memory resources can connect to accelerator or other compute capabilities, using compute express link (CXL) and other high-speed interconnect technologies. This provides a particular benefit for settings such as when operating base stations that use hot-pluggable accelerators or compute equipment.

The disclosed mechanisms provide a simplified abstraction for memory pooling, including determining how and when to use interleaving and how to improve the overall latency of memory storage and retrieval. As a result, memory operations can be parallelized and significantly sped up. Further, the following approaches are adaptable to a variety of use cases, including the use of “tiers” to service memory requests and workloads associated with a particular service requirement. As will be understood, in an edge computing setting, different workloads will have different requirements, in terms of latency, with or without affecting the service level agreements (SLAs). Here, by organizing memory scheduling and pooling, orchestration and the satisfaction of service agreement requirements can be more effectively accomplished.

In various examples, the logic that is used to configure the memory pooling and interleaving is managed by a network switch or other network-based component. For instance, a network switch can evaluate telemetry information to ensure that data across memory resources can be accessed with less latency—even during variations in use cases. The network switch can select different memory elements (and portions of different memory elements) in a pool, based on a required memory bandwidth or on the use of tiers (e.g., high/medium/low). The network switch may also select and reserve fabric resources (e.g., memory bandwidth) for connecting to servers, and directly perform reading/writing to memory ranges.
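To make the tier-based selection concrete, the following is a minimal sketch (in Python, for illustration) of how a switch might classify pool elements by telemetry-derived bandwidth and then pick elements for a request. The names (MemoryElement, select_for_request) and the tier thresholds are assumptions for this example, not elements defined by this disclosure.

```python
from dataclasses import dataclass

@dataclass
class MemoryElement:
    node_id: str
    free_bytes: int
    measured_bw_gbps: float   # telemetry-derived bandwidth to this element

def tier_of(element: MemoryElement) -> str:
    # Map measured bandwidth onto coarse tiers (thresholds are illustrative).
    if element.measured_bw_gbps >= 1.0:
        return "high"
    if element.measured_bw_gbps >= 0.1:
        return "medium"
    return "low"

def select_for_request(pool, needed_bytes: int, tier: str):
    """Pick pool elements in the requested tier until the request is covered."""
    chosen, remaining = [], needed_bytes
    for element in sorted(pool, key=lambda e: -e.measured_bw_gbps):
        if tier_of(element) != tier or remaining <= 0:
            continue
        take = min(element.free_bytes, remaining)
        if take > 0:
            chosen.append((element.node_id, take))
            remaining -= take
    if remaining > 0:
        raise RuntimeError("pool cannot satisfy the requested tier/size")
    return chosen

pool = [MemoryElement("bs-a", 1 << 30, 2.5), MemoryElement("bs-b", 1 << 30, 1.2)]
print(select_for_request(pool, 3 << 29, "high"))
```

In this sketch, a request that exceeds the highest-bandwidth element spills into the next element of the same tier, which mirrors the selection of "portions of different memory elements" noted above.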

Accordingly, the following describes coordinated, intelligent components to configure the right combination of memory and compute resources for servicing client workloads and increasing speed. While many of the techniques may be implemented by a switch, orchestrator, or controller, the techniques are also suited for use by networked processing units such as infrastructure processing units (IPUs, such as respective IPUs operating as a memory owner and remote memory consumer).

Additional implementation details of the memory pool interleaving techniques in an edge computing network, effected via a network switch or IPUs, are provided in FIGS. 8 to 11, below. General implementation details of an edge computing network and the use of distributed networked processing units in such a network are provided in FIGS. 1 to 7, below.

Distributed Edge Computing and Networked Processing Units

FIG. 1 is a block diagram 100 showing an overview of a distributed edge computing environment, which may be adapted for implementing the present techniques for distributed networked processing units. As shown, the edge cloud 110 is established from processing operations among one or more edge locations, such as a satellite vehicle 141, a base station 142, a network access point 143, an on-premise server 144, a network gateway 145, or similar networked devices and equipment instances. These processing operations may be coordinated by one or more edge computing platforms 120 or systems that operate networked processing units (e.g., IPUs, DPUs) as discussed herein.

The edge cloud 110 is generally defined as involving compute that is located closer to endpoints 160 (e.g., consumer and producer data sources) than the cloud 130, such as autonomous vehicles 161, user equipment 162, business and industrial equipment 163, video capture devices 164, drones 165, smart cities and building devices 166, sensors and IoT devices 167, etc. Compute, memory, network, and storage resources that are offered at the entities in the edge cloud 110 can provide ultra-low or improved latency response times for services and functions used by the endpoint data sources, as well as reduce network backhaul traffic from the edge cloud 110 toward cloud 130, thus improving energy consumption and overall network usage, among other benefits.

Compute, memory, and storage are scarce resources, and generally decrease depending on the edge location (e.g., fewer processing resources being available at consumer endpoint devices than at a base station or a central office data center). As a general design principle, edge computing attempts to minimize the number of resources needed for network services, through the distribution of more resources that are located closer both geographically and in terms of in-network access time.

FIG. 2 depicts examples of computing hardware provided among respective deployment tiers in a distributed edge computing environment. Here, one tier at an on-premise edge system is an intelligent sensor or gateway tier 210, which operates network devices with low power and entry-level processors and low-power accelerators. Another tier at an on-premise edge system is an intelligent edge tier 220, which operates edge nodes with higher power limitations and may include high-performance storage.

Further in the network, a network edge tier 230 operates servers including form factors optimized for extreme conditions (e.g., outdoors). A data center edge tier 240 operates additional types of edge nodes such as servers, and includes increasingly powerful or capable hardware and storage technologies. Still further in the network, a core data center tier 250 and a public cloud tier 260 operate compute equipment with the highest power consumption and largest configuration of processors, acceleration, storage/memory devices, and highest throughput network.

In each of these tiers, various forms of Intel® processor lines are depicted for purposes of illustration; it will be understood that other brands and manufacturers of hardware will be used in real-world deployments. Additionally, it will be understood that additional features or functions may exist among multiple tiers. One such example is connectivity and infrastructure management that enable a distributed IPU architecture that can potentially extend across all of tiers 210, 220, 230, 240, 250, 260. Other relevant functions that may extend across multiple tiers may relate to security features, domain or group functions, and the like.

FIG. 3 depicts additional characteristics of respective deployment tiers in a distributed edge computing environment, based on the tiers discussed with reference to FIG. 2. This figure depicts additional network latencies at each of the tiers 210, 220, 230, 240, 250, 260, and the gradual increase in latency in the network as the compute is located at a longer distance from the edge endpoints. Additionally, this figure depicts additional power and form factor constraints, use cases, and key performance indicators (KPIs).

With these variations and service features in mind, edge computing within the edge cloud 110 may provide the ability to serve and respond to multiple applications of the use cases in real-time or near real-time and meet ultra-low latency requirements. As systems have become highly distributed, networking has become one of the fundamental pieces of the architecture that allow achieving scale with resiliency, security, and reliability. Networking technologies have evolved to provide more capabilities beyond pure network routing capabilities, including the coordination of quality of service, security, multi-tenancy, and the like. This has also been accelerated by the development of new smart network adapter cards and other types of network derivatives that incorporate capabilities such as ASICs (application-specific integrated circuits) or FPGAs (field-programmable gate arrays) to accelerate some of those functionalities (e.g., remote attestation).

In these contexts, networked processing units have begun to be deployed at network cards (e.g., smart NICs), gateways, and the like, which allow direct processing of network workloads and operations. One example of a networked processing unit is an infrastructure processing unit (IPU), which is a programmable network device that can be extended to provide compute capabilities with far richer functionalities beyond pure networking functions. Another example of a networked processing unit is a data processing unit (DPU), which offers programmable hardware for performing infrastructure and network processing operations. The following discussion refers to functionality applicable to an IPU configuration, such as that provided by an Intel® line of IPU processors. However, it will be understood that the functionality will be equally applicable to DPUs and other types of networked processing units provided by ARM®, Nvidia®, and other hardware OEMs.

FIG. 4 depicts an example compute system architecture that includes a compute platform 420 and a network processing platform comprising an IPU 410. This architecture—and in particular the IPU 410—can be managed, coordinated, and orchestrated by the functionality discussed below, including with the functions described with reference to FIG. 6.

The main compute platform 420 is composed of typical elements that are included with a computing node, such as: one or more CPUs 424 that may or may not be connected via a coherent domain (e.g., via Ultra Path Interconnect (UPI) or another processor interconnect); one or more memory units 425; one or more additional discrete devices 426 such as storage devices, discrete acceleration cards (e.g., a field-programmable gate array (FPGA), a visual processing unit (VPU), etc.); a baseboard management controller 421; and the like. The compute platform 420 may operate one or more containers 422 (e.g., with one or more microservices), within a container runtime 423 (e.g., Docker containerd). The IPU 410 operates as a networking interface and is connected to the compute platform 420 using an interconnect (e.g., using either PCIe or CXL). The IPU 410, in this context, can be observed as another small compute device that has its own: (1) processing cores (e.g., provided by low-power cores 417); (2) operating system (OS) and cloud native platform 414 to operate one or more containers 415 and a container runtime 416; (3) acceleration functions provided by an ASIC 411 or FPGA 412; (4) memory 418; (5) network functions provided by network circuitry 413; etc.

From a system design perspective, this arrangement provides important functionality. The IPU 410 is seen as a discrete device from the local host (e.g., the OS running in the compute platform CPUs 424) that is available to provide certain functionalities (networking, acceleration, etc.). Those functionalities are typically provided via physical or virtual PCIe functions. Additionally, the IPU 410 is seen as a host (with its own IP, etc.) that can be accessed by the infrastructure to set up an OS, run services, and the like. The IPU 410 sees all the traffic going to the compute platform 420 and can perform actions—such as intercepting the data or performing some transformation—as long as the correct security credentials are hosted to decrypt the traffic. Traffic going through the IPU goes through all the layers of the Open Systems Interconnection model (OSI model) stack (e.g., from the physical to the application layer). Depending on the features that the IPU has, processing may be performed at the transport layer only. However, if the IPU has capabilities to perform traffic intercept, then the IPU also may be able to intercept traffic at the application layer (e.g., intercept CDN traffic and process it locally).

Some of the use cases being proposed for IPUs and similar networked processing units include: accelerating network processing; managing hosts (e.g., in a data center); or implementing quality of service policies. However, most functionalities today are focused on using the IPU at the local appliance level and within a single system. These approaches do not address how the IPUs could work together in a distributed fashion or how system functionalities can be divided among the IPUs on other parts of the system. Accordingly, the following introduces enhanced approaches for enabling and controlling distributed functionality among multiple networked processing units. This enables the extension of current IPU functionalities to work as a distributed set of IPUs that can work together to achieve stronger features such as resiliency, reliability, etc.

Distributed Architectures of IPUs

FIG. 5 depicts an IPU arrangement operating as a distributed network processing platform within network and data center edge settings. In a first deployment model of a computing environment 510, workloads or processing requests are directly provided to an IPU platform, such as directly to IPU 514. In a second deployment model of the computing environment 510, workloads or processing requests are provided to some intermediate processing device 512, such as a gateway or NUC (next unit of computing) device form factor, and the intermediate processing device 512 forwards the workloads or processing requests to the IPU 514. It will be understood that a variety of other deployment models involving the composability and coordination of one or more IPUs, compute units, network devices, and other hardware may be provided.

With the first deployment model, the IPU 514 directly receives data from use cases 502A. The IPU 514 operates one or more containers with microservices to perform processing of the data. As an example, a small gateway (e.g., a NUC type of appliance) may connect multiple cameras to an edge system that is managed or connected by the IPU 514. The IPU 514 may process data as a small aggregator of sensors that runs on the far edge, or may perform some level of inline processing or preprocessing and send the payload to be further processed by the IPU or the system that the IPU connects to.

With the second deployment model, the intermediate processing device 512 provided by the gateway or NUC receives data from use cases 502B. The intermediate processing device 512 includes various processing elements (e.g., CPU cores, GPUs), and may operate one or more microservices for servicing workloads from the use cases 502B. However, the intermediate processing device 512 invokes the IPU 514 to complete processing of the data.

In either the first or the second deployment model, the IPU 514 may connect with a local compute platform, such as that provided by a CPU 516 (e.g., Intel® Xeon CPU) operating multiple microservices. The IPU may also connect with a remote compute platform, such as that provided at a data center by CPU 540 at a remote server. As an example, consider a microservice that performs some analytical processing (e.g., face detection on image data), where the CPU 516 and the CPU 540 provide access to this same microservice. The IPU 514, depending on the current load of the CPU 516 and the CPU 540, may decide to forward the images or payload to one of the two CPUs. Data forwarding or processing can also depend on other factors such as the SLA for latency or performance metrics (e.g., perf/watt) in the two systems. As a result, the distributed IPU architecture may accomplish features of load balancing.
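The forwarding decision described above can be illustrated with a short sketch. This is one hypothetical policy, assuming simplified inputs (load, estimated latency, perf/watt); the disclosure does not prescribe a specific algorithm.

```python
def choose_target(local, remote, sla_latency_ms: float):
    """Return the platform expected to meet the SLA at the lowest load."""
    candidates = [p for p in (local, remote) if p["est_latency_ms"] <= sla_latency_ms]
    if not candidates:
        # No platform meets the SLA; degrade gracefully to the faster one.
        return min((local, remote), key=lambda p: p["est_latency_ms"])
    # Among SLA-compliant platforms, prefer lower load, then better perf/watt.
    return min(candidates, key=lambda p: (p["load"], -p["perf_per_watt"]))

local = {"name": "cpu-516", "load": 0.9, "est_latency_ms": 12.0, "perf_per_watt": 1.0}
remote = {"name": "cpu-540", "load": 0.3, "est_latency_ms": 18.0, "perf_per_watt": 1.4}
print(choose_target(local, remote, sla_latency_ms=20.0)["name"])  # -> cpu-540
```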

The IPU in the computing environment 510 may be coordinated with other network-connected IPUs. In an example, a Service and Infrastructure orchestration manager 530 may use multiple IPUs as a mechanism to implement advanced service processing schemes for the user stacks. This may also enable implementation of system functionalities such as failover, load balancing, etc.

In a distributed architecture example, IPUs can be arranged in the following non-limiting configurations. As a first configuration, a particular IPU (e.g., IPU 514) can work with other IPUs (e.g., IPU 520) to implement failover mechanisms. For example, an IPU can be configured to forward traffic to service replicas that run on other systems when a local host does not respond.

As a second configuration, a particular IPU (e.g., IPU 514) can work with other IPUs (e.g., IPU 520) to perform load balancing across other systems. For example, consider a scenario where CDN traffic targeted to the local host is forwarded to another host in case I/O or compute in the local host is scarce at a given moment.

As a third configuration, a particular IPU (e.g., IPU 514) can work as a power management entity to implement advanced system policies. For example, consider a scenario where the whole system (e.g., including CPU 516) is placed in a C6 state (a low-power/power-down state available to a processor) while forwarding traffic to other systems (e.g., IPU 520) and consolidating it.
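As one illustration of the first configuration (failover), the following sketch forwards a request to the local host and falls back to peer-hosted replicas when the host is unresponsive. The helper names and the health-probe field are hypothetical stand-ins for real data-plane liveness checks on the IPU.

```python
def host_responds(host) -> bool:
    # Placeholder health probe; a real IPU would use data-plane liveness checks.
    return host.get("alive", False)

def forward(request, host):
    return f"{request} -> {host['name']}"

def route_request(request, local_host, replicas):
    """Send to the local host; fall back to a peer-registered replica if it is down."""
    for target in [local_host] + replicas:
        if host_responds(target):
            return forward(request, target)
    raise ConnectionError("no responsive host or replica for this service")

local = {"name": "local-host", "alive": False}
replicas = [{"name": "replica-on-ipu-520", "alive": True}]
print(route_request("GET /object", local, replicas))  # -> replica-on-ipu-520
```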

As will be understood, fully coordinating a distributed IPU architecture requires numerous aspects of coordination and orchestration. The following examples of system architecture deployments provide discussion of how edge computing systems may be adapted to include coordinated IPUs, and how such deployments can be orchestrated to use IPUs at multiple locations to expand to the new envisioned functionality.

Distributed IPU Functionality

An arrangement of distributed IPUs offers a set of new functionalities to enable IPUs to be service focused. FIG. 6 depicts functional components of an IPU 610, including services and features to implement the distributed functionality discussed herein. It will be understood that some or all of the functional components provided in FIG. 6 may be distributed among multiple IPUs, hardware components, or platforms, depending on the particular configuration and use case involved.

In the block diagram of FIG. 6, a number of functional components are operated to manage requests for a service running in the IPU (or running in the local host). As discussed above, IPUs can either run services or intercept requests arriving at services running in the local host and perform some action. In the latter case, the IPU can perform the following types of actions/functions (provided as non-limiting examples).

Peer Discovery. In an example, each IPU is provided with Peer Discovery logic to discover other IPUs in the distributed system that can work together with it. Peer Discovery logic may use mechanisms such as broadcasting to discover other IPUs that are available on a network. The Peer Discovery logic is also responsible for working with the Peer Attestation and Authentication logic to validate and authenticate the peer IPU's identity, determine whether the peer is trustworthy, and determine whether the current system tenant allows the current IPU to work with the peer. To accomplish this, an IPU may perform operations such as: retrieve a proof of identity and proof of attestation; connect to a trusted service running in a trusted server; or validate that the discovered system is trustworthy. Various technologies (including hardware components or standardized software implementations) that enable attestation, authentication, and security may be used with such operations.
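A minimal sketch of such a discovery flow follows, assuming an in-memory list of broadcast responses and a stubbed attestation check; a real deployment would use network broadcast and a trusted attestation service rather than these hypothetical stand-ins.

```python
def discover_peers(broadcast_responses, tenant_allowlist, verify_attestation):
    """Return peers that present a valid attestation and are tenant-approved."""
    trusted = []
    for peer in broadcast_responses:            # replies to a discovery broadcast
        if peer["id"] not in tenant_allowlist:  # tenant policy gate
            continue
        if verify_attestation(peer["proof"]):   # proof checked with a trusted service
            trusted.append(peer["id"])
    return trusted

responses = [{"id": "ipu-520", "proof": "quote-ok"}, {"id": "ipu-x", "proof": "bad"}]
print(discover_peers(responses, {"ipu-520"}, lambda proof: proof == "quote-ok"))
```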

Peer Attestation. In an example, each IPU provides interfaces to other IPUs to enable attestation of the IPU itself. IPU Attestation logic is used to perform an attestation flow within a local IPU in order to create the proof of identity that will be shared with other IPUs. Attestation here may integrate previous approaches and technologies to attest a compute platform. This may also involve the use of a trusted attestation service 640 to perform the attestation operations.

Functionality Discovery. In an example, a particular IPU includes capabilities to discover the functionalities that peer IPUs provide. Once the authentication is done, the IPU can determine what functionalities the peer IPUs provide (using the IPU Peer Discovery logic) and store a record of such functionality locally. Examples of properties to discover can include: (i) the type of IPU, the functionalities provided, and associated KPIs (e.g., performance/watt, cost, etc.); (ii) available functionalities as well as possible functionalities to execute under secure enclaves (e.g., enclaves provided by Intel® SGX or TDX technologies); (iii) current services that are running on the IPU and on the system that can potentially accept requests forwarded from this IPU; or (iv) other interfaces or hooks that are provided by an IPU, such as access to remote storage, access to a remote VPU, or access to certain functions. In a specific example, a service may be described by properties such as: UUID; estimated performance KPIs in the host or IPU; average performance provided by the system during the N units of time (or any other type of indicator); and like properties.
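The locally stored record of discovered functionality might resemble the following sketch; the field names are assumptions for illustration, since no concrete schema is defined above.

```python
from dataclasses import dataclass, field

@dataclass
class PeerFunctionalityRecord:
    uuid: str
    ipu_type: str
    kpis: dict = field(default_factory=dict)        # e.g., {"perf_per_watt": 1.2, "cost": 0.8}
    enclave_capable: bool = False                   # functions executable under secure enclaves
    running_services: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)  # e.g., ["remote_storage", "remote_vpu"]
    avg_perf_last_n: float = 0.0                    # average performance over N units of time

record = PeerFunctionalityRecord(
    uuid="1f2e...", ipu_type="smartnic-ipu",
    kpis={"perf_per_watt": 1.2}, running_services=["cdn-cache"])
print(record.uuid, record.kpis)
```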

Service Management. The IPU includes functionality to manage services that are running either on the host compute platform or in the IPU itself. Managing (orchestrating) services includes performing service and resource orchestration for the services that can run on the IPU or that the IPU can affect. Two types of usage models are envisioned:

External Orchestration Coordination. The IPU may enable external orchestrators to deploy services on the IPU compute capabilities. To do so, an IPU includes a component similar to Kubernetes (K8s)-compatible APIs to manage the containers (services) that run on the IPU itself. For example, the IPU may run a service that is just providing content to storage connected to the platform. In this case, the orchestration entity running in the IPU may manage the services running in the IPU as happens in other systems (e.g., keeping the service level objectives).

Further, external orchestrators can be allowed to register with the IPU that services running on the host may require it to broker requests, implement failover mechanisms, and perform other functionalities. For example, an external orchestrator may register that a particular service running on the local compute platform is replicated in another edge node managed by another IPU, where requests can be forwarded.

In this latter use case, external orchestrators may provide to the Service/Application Intercept logic the inputs that are needed to intercept traffic for these services (as it typically is encrypted). This may include properties such as the source and destination of the traffic to be intercepted, or the key to use to decrypt the traffic. Likewise, this may be needed to terminate TLS to understand the requests that arrive at the IPU and that the other logics may need to parse to take actions. For example, if there is a CDN read request, the IPU may need to decrypt the packet to understand that the network packet includes a read request, and may redirect it to another host based on the content that is being intercepted. Examples of Service/Application Intercept information are depicted in table 620 in FIG. 6.

External Orchestration Implementation. External orchestration can be implemented in multiple topologies. One supported topology includes having the orchestrator manage all the IPUs running on the backend public or private cloud. Another supported topology includes having the orchestrator manage all the IPUs running in a centralized edge appliance. Still another supported topology includes having the orchestrator run in another IPU that is working as the controller, having the orchestrator run distributed in multiple other IPUs that are working as controllers (master/primary node), or having the orchestrator run in a hierarchical arrangement.

Functionality for Brokering requests. The IPU may include Service Request Brokering logic and Load Balancing logic to perform brokering actions on arrival of requests for target services running in the local system. For instance, the IPU may decide to see if those requests can be executed by other peer systems (e.g., accessible through Service and Infrastructure Orchestration 630). This can occur, for example, because load in the local system is high. The local IPU may negotiate with other peer IPUs for the possibility to forward the request. Negotiation may involve metrics such as cost. Based on such negotiation metrics, the IPU may decide to forward the request.

Functionality for Load Balancing requests. The Service Request Brokering and Load Balancing logic may distribute requests arriving at the local IPU to other peer IPUs. In this case, the other IPUs and the local IPU work together and do not necessarily need brokering. Such logic acts similarly to a cloud-native sidecar proxy. For instance, requests arriving at the system may be sent to the service X running in the local system (either IPU or compute platform) or forwarded to a peer IPU that has another instance of service X running. The load balancing distribution can be based on existing algorithms, such as selecting the systems that have lower load, using round robin, etc.
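The two named distribution policies can be sketched briefly; the instance names and load values here are illustrative only.

```python
import itertools

instances = [{"name": "local", "load": 0.7}, {"name": "peer-ipu", "load": 0.2}]

def least_loaded(instances):
    # Pick the service X instance currently reporting the lowest load.
    return min(instances, key=lambda i: i["load"])

# Round-robin iterator over the known instances of service X.
rr = itertools.cycle(instances)

print(least_loaded(instances)["name"])      # -> peer-ipu
print(next(rr)["name"], next(rr)["name"])   # -> local peer-ipu
```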

Functionality for failover, resiliency, and reliability. The IPU includes Reliability and Failover logic to monitor the status of the services running on the compute platform or the status of the compute platform itself. The Reliability and Failover logic may require the Load Balancing logic to transiently or permanently forward requests that target specific services in situations such as where: i) the compute platform is not responding; ii) the service running inside the compute node is not responding; or iii) the compute platform load prevents the targeted service from providing the right level of service level objectives (SLOs). Note that the logic must know the required SLOs for the services. Such functionality may be coordinated with service information 650 including SLO information.
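The three conditions can be expressed as a simple check, sketched below with hypothetical telemetry inputs standing in for real platform and service probes.

```python
def should_fail_over(platform_alive, service_alive, observed_latency_ms, slo_latency_ms):
    """Mirror conditions (i)-(iii): dead platform, dead service, or missed SLO."""
    if not platform_alive:                       # (i) compute platform not responding
        return True
    if not service_alive:                        # (ii) service in the node not responding
        return True
    return observed_latency_ms > slo_latency_ms  # (iii) load prevents meeting the SLO

print(should_fail_over(True, True, observed_latency_ms=45.0, slo_latency_ms=30.0))  # True
```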

Functionality for executing parts of the workloads. Use cases such as video analytics tend to be decomposed into different microservices that form a pipeline of actions that can be used together. The IPU may include workload pipeline execution logic that understands how workloads are composed and manages their execution. Workloads can be defined as a graph that connects different microservices. The load balancing and brokering logic may be able to understand those graphs and decide what parts of the pipeline are executed where. Further, to perform these and other operations, the Intercept logic will also decode what requests are included as part of the requests.

Resource Management

A distributed network processing configuration may enable IPUs to perform an important role in managing resources of edge appliances. As further shown in FIG. 6, the functional components of an IPU can operate to perform these and similar types of resource management functionalities.

As a first example, an IPU can provide management of or access to external resources that are hosted in other locations and expose them as local resources using constructs such as Compute Express Link (CXL). For example, the IPU could potentially provide access to a remote accelerator that is hosted in a remote system via CXL.mem/cache and IO. Another example includes providing access to a remote storage device hosted in another system. In this latter case, the local IPU could work with another IPU in the storage system and expose the remote system as PCIe VF/PF (virtual functions/physical functions) to the local host.

As a second example, an IPU can provide access to IPU-specific resources. Those IPU resources may be physical (such as storage or memory) or virtual (such as a service that provides access to random number generation).

As a third example, an IPU can manage local resources that are hosted in the system where it belongs. For example, the IPU can manage power of the local compute platform.

As a fourth example, an IPU can provide access to other types of elements that relate to resources (such as telemetry or other types of data). In particular, telemetry provides useful data that is needed to decide where to execute things or to identify problems.

I/O Management. Because the IPU is acting as a connection proxy between the resources of external peers (compute systems, remote storage, etc.) and the local compute, the IPU can also include functionality to manage I/O from the system perspective.

Host Virtualization and XPU Pooling. The IPU includes Host Virtualization and XPU Pooling logic responsible for managing the access to resources that are outside the system domain (or within the IPU) and that can be offered to the local compute system. Here, “XPU” refers to any type of processing unit, whether CPU, GPU, VPU, an acceleration processing unit, etc. The IPU logic, after discovery and attestation, can agree with other systems to share external resources with the services running in the local system. IPUs may advertise available resources to other peers, or the resources can be discovered during the discovery phase as introduced earlier. An IPU may request access to those resources from other IPUs. For example, an IPU on system A may request access to storage on system B managed by another IPU. Remote and local IPUs can work together to establish a connection between the target resources and the local system.

Once the connection and resource mapping is completed, resources can be exposed to the services running in the local compute node using the VF/PF PCIe and CXL logic. Each of those resources can be offered as VF/PF. The IPU logic can expose to the local host resources that are hosted in the IPU. Examples of resources to expose may include local accelerators, access to services, and the like.

Power Management. Power management is one of the key features to achieve favorable system operational expenditures (OPEX). The IPU is very well positioned to optimize the power consumption of the local system. The distributed and local power management unit is responsible for metering the power that the system is consuming and the load that the system is receiving, and for tracking the service level agreements that the various services running in the system are achieving for the arriving requests. Likewise, when power efficiencies (e.g., power usage effectiveness (PUE)) are not achieving certain thresholds or the local compute demand is low, the IPU may decide to forward the requests for local services to other IPUs that host replicas of the services. Such power management features may also coordinate with the Brokering and Load Balancing logic discussed above. As will be understood, IPUs can work together to decide where requests can be consolidated to establish higher power efficiency as a system. When traffic is redirected, the local power consumption can be reduced in different ways. Example operations that can be performed include: changing the system to the C6 state; changing the base frequencies; or performing other adaptations of the system or system components.
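A minimal sketch of this consolidation decision follows; the thresholds are assumptions for illustration, and real PUE targets and replica sets would come from operator policy and the peer-discovery data discussed earlier.

```python
def consolidation_plan(local_load, pue, replicas, pue_target=1.5, low_load=0.2):
    """Forward to replicas and power down locally when efficiency is poor."""
    if pue > pue_target or local_load < low_load:
        return {"forward_to": replicas, "local_action": "enter C6 / lower base frequency"}
    return {"forward_to": [], "local_action": "keep serving locally"}

print(consolidation_plan(local_load=0.1, pue=1.8, replicas=["ipu-520"]))
```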

Telemetry Metrics. The IPU can generate multiple types of metrics that can be of interest to services, orchestration, or tenants owning the system. In various examples, telemetry can be accessed: (i) out of band via side interfaces; (ii) in band by services running in the IPU; or (iii) out of band using PCIe or CXL from the host perspective. Relevant types of telemetry can include: platform telemetry; service telemetry; IPU telemetry; traffic telemetry; and the like.

System Configurations for Distributed Processing

Further to the examples noted above, the following configurations may be used for processing with distributed IPUs:

1) Local IPUs connected to a compute platform by an interconnect (e.g., as shown in the configuration of FIG. 4);

2) Shared IPUs hosted within a rack/physical network—such as in a virtual slice or multi-tenant implementation of IPUs connected via CXL/PCIe (local), or extension via Ethernet/fiber for nodes within a cluster;

3) Remote IPUs accessed via an IP network, such as within certain latency for data plane offload/storage offloads (or connected for management/control plane operations); or

4) Distributed IPUs providing an interconnected network of IPUs, including as many as hundreds of nodes within a domain.

Configurations of distributed IPUs working together may also include fragmented distributed IPUs, where each IPU or pooled system provides part of the functionalities, and each IPU becomes a malleable system. Configurations of distributed IPUs may also include virtualized IPUs, such as provided by a gateway, switch, or an inline component (e.g., inline between the service acting as an IPU), and in some examples, in scenarios where the system has no IPU.

Other deployment models for IPUs may include: IPU-to-IPU in the same tier or a close tier; IPU-to-IPU in the cloud (data to compute versus compute to data); integration in small device form factors (e.g., gateway IPUs); gateway/NUC+IPU which connects to a data center; multiple GW/NUC (e.g., 16) which connect to one IPU (e.g., a switch); gateway/NUC+IPU on the server; and GW/NUC and IPU that are connected to a server with an IPU.

The preceding distributed IPU functionality may be implemented among a variety of types of computing architectures, including one or more gateway nodes, one or more aggregation nodes, or edge or core data centers distributed across layers of the network (e.g., in the arrangements depicted in FIGS. 2 and 3). Accordingly, such IPU arrangements may be implemented in an edge computing system by or on behalf of a telecommunication service provider (“telco”, or “TSP”), internet-of-things service provider, cloud service provider (CSP), enterprise entity, or any other number of entities. Various implementations and configurations of the edge computing system may be provided dynamically, such as when orchestrated to meet service objectives. Such edge computing systems may be embodied as a type of device, appliance, computer, or other “thing” capable of communicating with other edge, networking, or endpoint components.

FIG. 7 depicts a block diagram of example components in a computing device 750 which can operate as a distributed network processing platform. The computing device 750 may include any combinations of the components referenced above, implemented as integrated circuits (ICs), as a package or system-on-chip (SoC), or as portions thereof, discrete electronic devices, or other modules, logic, instruction sets, programmable logic or algorithms, hardware, hardware accelerators, software, firmware, or a combination thereof adapted in the computing device 750, or as components otherwise incorporated within a larger system. Specifically, the computing device 750 may include processing circuitry comprising one or both of a network processing unit 752 (e.g., an IPU or DPU, as discussed above) and a compute processing unit 754 (e.g., a CPU).

The network processing unit 752 may provide a networked specialized processing unit such as an IPU, DPU, network processing unit (NPU), or other “xPU” outside of the central processing unit (CPU). The processing unit may be embodied as a standalone circuit or circuit package, integrated within an SoC, integrated with networking circuitry (e.g., in a SmartNIC), or integrated with acceleration circuitry, storage devices, or AI or specialized hardware, consistent with the examples above.

The compute processing unit 754 may provide a processor as a central processing unit (CPU) microprocessor, multi-core processor, multithreaded processor, an ultra-low voltage processor, an embedded processor, or other forms of a special purpose processing unit or specialized processing unit for compute operations.

Either the network processing unit 752 or the compute processing unit 754 may be a part of a system on a chip (SoC) which includes components formed into a single integrated circuit or a single package. The network processing unit 752 or the compute processing unit 754 and accompanying circuitry may be provided in a single socket form factor, multiple socket form factor, or a variety of other formats.

The processing units 752, 754 may communicate with a system memory 756 (e.g., random access memory (RAM)) over an interconnect 755 (e.g., a bus). In an example, the system memory 756 may be embodied as volatile (e.g., dynamic random access memory (DRAM), etc.) memory. Any number of memory devices may be used to provide for a given amount of system memory. A storage 758 may also couple to the processor 752 via the interconnect 755 to provide for persistent storage of information such as data, applications, operating systems, and so forth. In an example, the storage 758 may be implemented as non-volatile storage such as a solid-state disk drive (SSD).

The components may communicate over the interconnect 755. The interconnect 755 may include any number of technologies, including industry-standard architecture (ISA), extended ISA (EISA), peripheral component interconnect (PCI), peripheral component interconnect extended (PCIx), PCI express (PCIe), Compute Express Link (CXL), or any number of other technologies. The interconnect 755 may couple the processing units 752, 754 to a transceiver 766, for communications with connected edge devices 762.

The transceiver 766 may use any number of frequencies and protocols. For example, a wireless local area network (WLAN) unit may implement Wi-Fi® communications in accordance with the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard, or a wireless wide area network (WWAN) unit may implement wireless wide area communications according to a cellular, mobile network, or other wireless wide area protocol. The wireless network transceiver 766 (or multiple transceivers) may communicate using multiple standards or radios for communications at a different range. A wireless network transceiver 766 (e.g., a radio transceiver) may be included to communicate with devices or services in the edge cloud 110 or the cloud 130 via local or wide area network protocols.

The communication circuitry (e.g., transceiver 766, network interface 768, external interface 770, etc.) may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., a cellular networking protocol such as a 3GPP 4G or 5G standard, a wireless local area network protocol such as IEEE 802.11/Wi-Fi®, a wireless wide area network protocol, Ethernet, Bluetooth®, Bluetooth Low Energy, an IoT protocol such as IEEE 802.15.4 or ZigBee®, Matter®, low-power wide-area network (LPWAN) or low-power wide-area (LPWA) protocols, etc.) to effect such communication. Given the variety of types of applicable communications from the device to another component or network, applicable communications circuitry used by the device may include or be embodied by any one or more of components 766, 768, or 770. Accordingly, in various examples, applicable means for communicating (e.g., receiving, transmitting, etc.) may be embodied by such communications circuitry.

The computing device 750 may include or be coupled to acceleration circuitry 764, which may be embodied by one or more AI accelerators, a neural compute stick, neuromorphic hardware, an FPGA, an arrangement of GPUs, one or more SoCs, one or more CPUs, one or more digital signal processors, dedicated ASICs, or other forms of specialized processors or circuitry designed to accomplish one or more specialized tasks. These tasks may include AI processing (including machine learning, training, inferencing, and classification operations), visual data processing, network data processing, object detection, rule analysis, or the like. Accordingly, in various examples, applicable means for acceleration may be embodied by such acceleration circuitry.

The interconnect 755 may couple the processing units 752, 754 to a sensor hub or external interface 770 that is used to connect additional devices or subsystems. The devices may include sensors 772, such as accelerometers, level sensors, flow sensors, optical light sensors, camera sensors, temperature sensors, global navigation system (e.g., GPS) sensors, pressure sensors, and the like. The hub or interface 770 further may be used to connect the edge computing node 750 to actuators 774, such as power switches, valve actuators, an audible sound generator, a visual warning device, and the like.

In some optional examples, various input/output (I/O) devices may be present within, or connected to, the edge computing node 750. For example, a display or other output device 784 may be included to show information, such as sensor readings or actuator position. An input device 786, such as a touch screen or keypad, may be included to accept input. An output device 784 may include any number of forms of audio or visual display, including simple visual outputs such as LEDs or more complex outputs such as display screens (e.g., LCD screens), with the output of characters, graphics, multimedia objects, and the like being generated or produced from the operation of the edge computing node 750.

A battery 776 may power the edge computing node 750, although, in examples in which the edge computing node 750 is mounted in a fixed location, it may have a power supply coupled to an electrical grid, or the battery may be used as a backup or for temporary capabilities. A battery monitor/charger 778 may be included in the edge computing node 750 to track the state of charge (SoCh) of the battery 776. The battery monitor/charger 778 may be used to monitor other parameters of the battery 776 to provide failure predictions, such as the state of health (SoH) and the state of function (SoF) of the battery 776. A power block 780, or other power supply coupled to a grid, may be coupled with the battery monitor/charger 778 to charge the battery 776.

In an example, the instructions 782 on the processing units 752, 754 (separately, or in combination with the instructions 782 of the machine-readable medium 760) may configure execution or operation of a trusted execution environment (TEE) 790. In an example, the TEE 790 operates as a protected area accessible to the processing units 752, 754 for secure execution of instructions and secure access to data. Other aspects of security hardening, hardware roots-of-trust, and trusted or protected operations may be implemented in the edge computing node 750 through the TEE 790 and the processing units 752, 754.

The computing device 750 may be a server, an appliance computing device, and/or any other type of computing device with the various form factors discussed above. For example, the computing device 750 may be provided by an appliance computing device that is a self-contained electronic device including a housing, a chassis, a case, or a shell.

In an example, the instructions 782 provided via the memory 756, the storage 758, or the processing units 752, 754 may be embodied as a non-transitory, machine-readable medium 760 including code to direct the processor 752 to perform electronic operations in the edge computing node 750. The processing units 752, 754 may access the non-transitory, machine-readable medium 760 over the interconnect 755. For instance, the non-transitory, machine-readable medium 760 may be embodied by devices described for the storage 758 or may include specific storage units such as optical disks, flash drives, or any number of other hardware devices. The non-transitory, machine-readable medium 760 may include instructions to direct the processing units 752, 754 to perform a specific sequence or flow of actions, for example, as described with respect to the flowchart(s) and block diagram(s) of operations and functionality discussed herein. As used herein, the terms “machine-readable medium”, “machine-readable storage”, “computer-readable storage”, and “computer-readable medium” are interchangeable.

In further examples, a machine-readable medium also includes any tangible medium that is capable of storing, encoding, or carrying instructions for execution by a machine and that cause the machine to perform any one or more of the methodologies of the present disclosure, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such instructions. A “machine-readable medium” thus may include, but is not limited to, solid-state memories, and optical and magnetic media. The instructions embodied by a machine-readable medium may further be transmitted or received over a communications network using a transmission medium via a network interface device utilizing any one of a number of transfer protocols (e.g., HTTP).

A machine-readable medium may be provided by a storage device or other apparatus which is capable of hosting data in a non-transitory format. In an example, information stored or otherwise provided on a machine-readable medium may be representative of instructions, such as instructions themselves or a format from which the instructions may be derived. This format from which the instructions may be derived may include source code, encoded instructions (e.g., in compressed or encrypted form), packaged instructions (e.g., split into multiple packages), or the like. The information representative of the instructions in the machine-readable medium may be processed by processing circuitry into the instructions to implement any of the operations discussed herein. For example, deriving the instructions from the information (e.g., processing by the processing circuitry) may include: compiling (e.g., from source code, object code, etc.), interpreting, loading, organizing (e.g., dynamically or statically linking), encoding, decoding, encrypting, unencrypting, packaging, unpackaging, or otherwise manipulating the information into the instructions.

In an example, the derivation of the instructions may include assembly, compilation, or interpretation of the information (e.g., by the processing circuitry) to create the instructions from some intermediate or preprocessed format provided by the machine-readable medium. The information, when provided in multiple parts, may be combined, unpacked, and modified to create the instructions. For example, the information may be in multiple compressed source code packages (or object code, or binary executable code, etc.) on one or several remote servers.

In further examples, a software distribution platform (e.g., one or more servers and one or more storage devices) may be used to distribute software, such as the example instructions discussed above, to one or more devices, such as example processor platform(s) and/or example connected edge devices noted above. The example software distribution platform may be implemented by any computer server, data facility, cloud service, etc., capable of storing and transmitting software to other computing devices. In some examples, the providing entity is a developer, a seller, and/or a licensor of software, and the receiving entity may be consumers, users, retailers, OEMs, etc., that purchase and/or license the software for use and/or re-sale and/or sub-licensing.

In some examples, the instructions are stored on storage devices of the software distribution platform in a particular format. A format of computer readable instructions includes, but is not limited to, a particular code language (e.g., Java, JavaScript, Python, C, C#, SQL, HTML, etc.) and/or a particular code state (e.g., uncompiled code (e.g., ASCII), interpreted code, linked code, executable code (e.g., a binary), etc.). In some examples, the computer readable instructions stored in the software distribution platform are in a first format when transmitted to an example processor platform(s). In some examples, the first format is an executable binary that particular types of the processor platform(s) can execute. However, in some examples, the first format is uncompiled code that requires one or more preparation tasks to transform the first format to a second format to enable execution on the example processor platform(s). For instance, the receiving processor platform(s) may need to compile the computer readable instructions in the first format to generate executable code in a second format that is capable of being executed on the processor platform(s). In still other examples, the first format is interpreted code that, upon reaching the processor platform(s), is interpreted by an interpreter to facilitate execution of instructions.

Memory Interleaving on Edge Computing Systems

In a variety of edge computing settings, there are variations in load for network traffic and usage of computing resources. For example, consider an edge computing network that deploys compute resources at respective base stations to process workloads, where each of the respective base stations is connected to different and varying numbers of client devices at different times, to perform varying types and amounts of processing operations. Many edge computing deployments attempt to handle this variation in traffic and compute usage by the organization and use of disaggregated resources situated at base stations and central offices. For example, memory and compute resources may be pooled among multiple locations to effectively handle workload among shared resources.

FIG. 8 depicts a scenario of edge computing operations coordinated among a user layer 810, an edge layer 820, and a cloud layer 830. Consistent with the examples discussed above (e.g., with reference to FIGS. 1 to 3), edge computing operations may be performed at the edge layer 820 based on requests from client devices of a heterogeneous network 812, a vehicular network 814, or a machine-to-machine (M2M) or device-to-device (D2D) network 816. The edge layer 820 may further invoke functions resident in a cloud layer 830 to perform further data processing or data retrieval at a remote data center.

The disaggregated resources available in the edge layer 820 include communication resources 822, computing resources 824, caching resources 826, and the like, provided among a variety of devices or nodes. The resources 822, 824, and 826 may be arranged into compute pools, memory pools, etc., as shown by memory pooling 840, which represents a virtual pool of memory comprised of portions of memory devices existing among multiple physical systems. In some examples, a first set of resources may be tunneled via an interconnect protocol (e.g., CXL) to a second set of resources, such as the connection of accelerators to memory pools to enable the execution of compute operations on memory regions at the software level.

Within existing systems in the edge layer 820, there is a lack of capability to dynamically carve out memory regions across memory resources that are distributed. As a result, existing edge compute systems are unable to dynamically adapt their pooled resources based on bandwidth and other requirements. Further, with existing approaches, the memory pooling 840 that is established between different systems (e.g., between base stations) is fixed and cannot be easily reconfigured. There is no capability with existing pooling approaches to dynamically attach or adapt resources, such as mapping acceleration capability to memory pooling on the fly.

Carving out memory regions for dynamic memory pooling and memory pooling adaptation results in a number of new capabilities. For instance, existing approaches for memory pooling do not consider an estimation of current and future edge traffic at each base station, the corresponding pressure on various memory pools, or the proximity to various accelerators and their load. The present techniques for dynamic memory pool interleaving can consider these requirements, in addition to other aspects of memory pool performance such as granularity of interleaving, priority, redundancy/replication requirements, and the like. Further, the present techniques, especially when deployed at multiple base stations, can also consider sufficient proximity and network latency bottlenecks so that memory pool interleaving can be successfully performed without service degradation.

In the following, dynamic memory and resource pooling approaches are introduced that better adapt to real-world scenarios and service usage. In particular, a number of capabilities are introduced into a network switch to access and control a pooled architecture of memory resources (or any other resource pooling with similar characteristics). These capabilities include the introduction of transparent, smart interleaving methods that are network aware. These capabilities also include pooling mechanisms that may be operated with bandwidth augmentation. This is accomplished with the use of estimation and prediction logic operating at the network switch, to evaluate resource telemetry and dynamically identify resource needs.

Various examples of memory pool interleaving are provided, but it will be understood that the present techniques for resource pooling can be combined with other types of interleaving and memory pool use management. As used herein, memory pool interleaving refers to the dispersing of memory storage and access among disaggregated, networked physical memory resources and locations (e.g., among different network-connected nodes or devices). The memory pool interleaving is an arrangement (e.g., configuration, scheme, approach) that is organized and performed in a coordinated fashion to reduce latency for overall use of the memory pool. For instance, in scenarios where available bandwidth presents a bottleneck at a particular memory location, other memory resources of the pool are deployed for use in the pool.

It will be understood that memory pool interleaving, as used herein, is performed at a resource or system level, and is generally distinguishable from the memory address interleaving that is commonly performed by a memory controller on individual memory banks within a memory module. Thus, individual systems in a distributed memory pool may still use memory address interleaving within their own memory modules. Further, memory pool interleaving may involve interleaving of data chunks, blocks, or sets that are much larger than those used in conventional memory module interleaving.
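
To make the distinction concrete, the following minimal sketch shows chunk-granular striping of a pooled address range across four networked memory resources, as opposed to bank-granular interleaving inside one module. The chunk size, node names, and function name are illustrative assumptions, not elements of the disclosure.

```python
# A minimal sketch of chunk-granular memory pool interleaving, assuming a
# hypothetical pool of four networked memory resources and a 64 MiB chunk
# size (both illustrative values).
CHUNK_SIZE = 64 * 1024 * 1024             # pool-level interleave granularity
NODES = ["914A", "914B", "914C", "914D"]  # disaggregated memory resources

def pool_locate(pool_offset: int) -> tuple[str, int]:
    """Map a byte offset in the pooled region to (node, node-local offset)."""
    chunk = pool_offset // CHUNK_SIZE
    node = NODES[chunk % len(NODES)]  # round-robin striping across the pool
    local = (chunk // len(NODES)) * CHUNK_SIZE + (pool_offset % CHUNK_SIZE)
    return node, local

# Consecutive pool chunks land on different networked nodes, unlike
# bank-level interleaving inside a single memory module:
assert pool_locate(0)[0] != pool_locate(CHUNK_SIZE)[0]
```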

The use of memory pool interleaving among multiple systems enables a unique capability to shape load for edge computing operations. Additionally, memory pool interleaving provides the capability to weave together memory hierarchies on the fly, attaching accelerators to perform operations on data in the memory. This also enables the ability to deploy accelerators in response to varying edge loads, especially at base stations.

FIG. 9 depicts a network architecture 900 for establishing memory pooling among compute resources 912 and memory resources 914 of respective base stations 910. In this scenario, base station 910A includes compute resources 912A and memory resources 914A, base station 910B includes compute resources 912B and memory resources 914B, base station 910C includes compute resources 912C and memory resources 914C, and base station 910D includes compute resources 912D and memory resources 914D. Each base station 910A-D is accessible via one or more interconnects or networks (not shown), and each base station 910A-D operates as an independent platform that can provide one or more disaggregated resources for pooling. As used herein, a “disaggregated” resource generally refers to an “unassociated”, “unpooled”, “distinct”, or “unshared” resource and, consistent with its usage as a term of art, does not necessarily mean that the resource was aggregated at one point and then later separated.

In an example, a software stack on each platform (e.g., implemented in an operating system or network stack, or both) exposes an interface (e.g., an application programming interface (API)) for identifying and configuring a pooled memory allocation. This interface can receive requests that identify a level of memory bandwidth, to enable specification of the amount of bandwidth required for a memory chunk being allocated in the memory pool. For instance, multiple categories of memory bandwidth may be provided for use in the architecture, such as three types corresponding to High, Medium, and Low (as a non-limiting example, High corresponding to over 1 Gbps, Medium corresponding to between 100 and 999 Mbps, and Low corresponding to under 100 Mbps). Each memory type may be mapped to a different global memory address space. Additionally, the interface may enable an interleaving capability to be turned on and off for all or some of the pooled memory allocation. The interleaving capability can be configured with an explicit command invoked from the interface (e.g., to set interleaving on or off), or the interleaving capability can be implicitly configured based on a definition of the memory type. Various data may be used to save the state of the pooled memory allocation and which memory resources are available, and such data may be updated if the state of the memory resources changes.
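
As one possible shape for such an interface, the following sketch models an allocation request carrying a bandwidth class and an optional interleaving flag. The names (BandwidthClass, PooledAllocRequest, allocate_pooled) and the region base addresses are hypothetical illustrations, not an API defined by this disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class BandwidthClass(Enum):
    HIGH = "high"      # e.g., over 1 Gbps
    MEDIUM = "medium"  # e.g., between 100 and 999 Mbps
    LOW = "low"        # e.g., under 100 Mbps

@dataclass
class PooledAllocRequest:
    size_bytes: int
    bandwidth: BandwidthClass
    interleave: Optional[bool] = None  # None: implied by the memory type

# Each bandwidth class maps to a distinct global memory address space.
REGION_BASE = {
    BandwidthClass.HIGH: 0x1000_0000_0000,
    BandwidthClass.MEDIUM: 0x2000_0000_0000,
    BandwidthClass.LOW: 0x3000_0000_0000,
}

def allocate_pooled(req: PooledAllocRequest) -> int:
    """Return a base address in the global address space for the request.

    A real implementation would negotiate with the end memory pools; this
    stub only selects the address region implied by the bandwidth class.
    """
    if req.interleave is None:
        # Implicit configuration: high-bandwidth types imply interleaving.
        req.interleave = req.bandwidth is BandwidthClass.HIGH
    return REGION_BASE[req.bandwidth]

addr = allocate_pooled(PooledAllocRequest(1 << 30, BandwidthClass.HIGH))
print(hex(addr))  # 0x100000000000
```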

The present approaches thus enable a use of memory pooling that can control which portions of a memory resource or memory pool should (or should not) be interleaved among different memory locations, and which types of memory uses can or cannot be interleaved among different memory locations. In contrast, existing memory interleaving implementations are designed to enable interleaving across all regions of a single memory location, or, at best, provide only a very limited number of memory regions in the location with interleaving disabled.

At a network switch 920, control of interleaving for a memory pool may be implemented by use of interleave logic 928. In an example, the interleave logic 928 may be implemented by use of a global source address decoder. For instance, the interleave logic 928 can be used to dynamically allocate the requested memory in the pooled memory region 930 from one of the interleaved memory spaces (e.g., a pool constructed from memory areas with memory storage interleaved among multiple memory resources 914A, 914B, 914C, 914D), or from one of the non-interleaved memory spaces (e.g., a pool constructed by directly storing to only one of the memory resources 914A, 914B, 914C, or 914D). Accordingly, the pooled memory region 930 may include both interleaved and non-interleaved memory storage among the various platforms.
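
A global source address decoder of this kind can be sketched as a lookup that routes each pooled address either to an interleaved window striped across all resources or to a window pinned to a single resource. The window bounds, chunk size, and names below are illustrative assumptions rather than a definitive implementation.

```python
NODES = ("914A", "914B", "914C", "914D")
CHUNK = 1 << 20                                # 1 MiB interleave granule
INTERLEAVED = (0x0000_0000, 0x4000_0000)       # striped across all nodes
PINNED = {(0x4000_0000, 0x5000_0000): "914A"}  # non-interleaved window

def decode(addr: int) -> str:
    """Resolve a pooled-region address to the memory resource backing it."""
    lo, hi = INTERLEAVED
    if lo <= addr < hi:
        return NODES[(addr // CHUNK) % len(NODES)]  # interleaved space
    for (plo, phi), node in PINNED.items():
        if plo <= addr < phi:
            return node                             # non-interleaved space
    raise ValueError(f"address {addr:#x} is not mapped in the pooled region")

print(decode(0x0010_0000))  # second chunk of the interleaved space: 914B
print(decode(0x4000_0000))  # pinned, non-interleaved space: 914A
```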

The switch 920 may also configure the pooled memory region 930 to support use cases with in-memory compute capabilities. For example, the use cases may provide hints to use a non-interleaved mode, to enable compute on the entire data as opposed to only a chunk of the interleaved data. Non-interleaved allocation may also be necessary for specific devices (DRAM, NVM, or others) that must be managed for resiliency through hot plugging or hot unplugging, so that infrastructure can be serviced without interfering with the execution of distributed workloads.

In a further example, the switch 920 also implements an interface and additional logic for dynamic configuration and re-configuration of interleaved memory pooling. First, the logic implemented in the switch 920 can include bandwidth estimator and predictor logic 922. The bandwidth estimator and predictor logic 922 is used to select the number of end pooled memory servers needed to achieve the required level of memory bandwidth. The bandwidth estimator and predictor logic 922 also selects and potentially reserves the fabric resources (e.g., specific memory bandwidth of a virtual channel) for connectivity to those servers. In a similar manner, an acceleration requirement estimator logic 924 can identify the usage of acceleration resources for in-memory processing, and an incoming load requirement predictor logic 926 can provide a prediction of workload usage and the resources needed for workload processing with use of the compute and memory resources. Each of these logic blocks may also consider priority, tiering, and classification. Each of these logic blocks may also discover or identify aspects of the disaggregated memory resources and update relevant data structures or databases about the disaggregated memory resources.
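
The selection step performed by logic such as the bandwidth estimator and predictor logic 922 can be approximated as the sketch below, which greedily reserves spare bandwidth on the least-loaded pooled memory servers until the requested level is met. The per-server figures and the function name are hypothetical.

```python
def plan_allocation(required_mbps: float, spare_mbps: dict[str, float]):
    """Choose pooled memory servers, and per-server reservations in Mbps,
    sufficient to back an allocation at the required memory bandwidth."""
    reservations = []
    remaining = required_mbps
    # Visit servers with the most spare fabric bandwidth first.
    for server, spare in sorted(spare_mbps.items(), key=lambda kv: -kv[1]):
        if remaining <= 0:
            break
        take = min(spare, remaining)
        reservations.append((server, take))
        remaining -= take
    if remaining > 0:
        raise RuntimeError("the pool cannot meet the requested bandwidth")
    return reservations

# Three servers are needed to assemble 1500 Mbps from these spare figures.
print(plan_allocation(1500, {"914A": 800, "914B": 600, "914C": 400}))
```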

Next, the switch 920 implements the interleave logic 928. The switch 920 negotiates with the end memory pools to allocate the required memory chunks that will be interleaved in the pooled memory region 930. As can be understood, memory pool interleaving also includes some latency considerations, and can use different interleaving sizes to achieve different pooling properties. Further, interleaving also includes resiliency and infrastructure service management considerations.

The switch 920 may also use other logic (not depicted) to process reads and writes for a particular memory range. This logic is responsible for creating the corresponding unicast or multicast messages to split or gather all the required data, and for responding with one single response to the originator. For example, the interleave logic 928 may consider scenarios where the fabric resources are scarce and not sufficient to satisfy a particular request without temporarily borrowing bandwidth from the best-effort address ranges. The interleave logic 928 may also determine how pools from different groups can be mapped into the same interleaving type to apply load-balancing schemes.
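
The split/gather behavior for a read that spans several interleaved chunks might look like the following sketch, in which one message is issued per backing resource and the replies are merged into a single response. Here, fetch_remote is a hypothetical transport stub, and the chunk size and node names are illustrative.

```python
CHUNK = 1 << 20
NODES = ("914A", "914B", "914C", "914D")

def fetch_remote(node: str, local_offset: int, length: int) -> bytes:
    # Stand-in for a unicast read message sent to one pooled memory server.
    return bytes(length)

def pooled_read(offset: int, length: int) -> bytes:
    """Gather one contiguous pooled-region read from interleaved chunks."""
    parts, cur, end = [], offset, offset + length
    while cur < end:
        chunk = cur // CHUNK
        node = NODES[chunk % len(NODES)]
        local = (chunk // len(NODES)) * CHUNK + (cur % CHUNK)
        take = min(end - cur, CHUNK - (cur % CHUNK))  # stay inside a chunk
        parts.append(fetch_remote(node, local, take))
        cur += take
    return b"".join(parts)  # one single response to the originator

# A read straddling a chunk boundary fans out to two different nodes.
assert len(pooled_read(CHUNK - 10, 20)) == 20
```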

The interfaces in the respective platforms (e.g., 910A-910D) provide support for exposing (e.g., identifying, discovering) tiering within pools, where a deeply interleaved upper tier of memory at each pool is combined with a lower tier whose exposed capacity is lightly interleaved or not interleaved. These interfaces enable the infrastructure to support caching of popular or streaming data that flows from lower tiers. In some embodiments, when data cannot be split across servers (i.e., cannot be pooled), such interfaces provide an ability to achieve the aggregate bandwidth and low latency of the highly interleaved upper-tier memory and the high capacity of lower-tiered memory/storage across the same interfaces. Further, such interfaces may be extended to enable transparent use of processing-in-memory/processing-in-storage through per-node acceleration logic.
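
One hypothetical way a platform interface might describe such tiering is a small declarative structure, sketched below; every field name and value is an illustrative assumption.

```python
# A deeply interleaved, low-latency upper tier in front of a high-capacity
# lower tier that is lightly interleaved, exposed through one interface.
pool_tiers = {
    "upper": {
        "media": "DRAM",
        "interleave_ways": 4,   # striped across all four platforms
        "capacity_gib": 256,
        "role": "cache for popular or streaming data from the lower tier",
    },
    "lower": {
        "media": "NVM",
        "interleave_ways": 1,   # effectively not interleaved
        "capacity_gib": 4096,
        "role": "bulk capacity; bandwidth served through the upper tier",
    },
}
```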

The use of memory regions and configurations may be based on the use of tiers as noted above. It will be understood that additional in-memory-computing or in-storage-computing can be supported in the outer (high-capacity) tiers, with the computed results supplied from the inner tiers. At the same time, in-pool compute can be supported with low-power CPUs/XPUs for branch-based compute operations (sort, filter, etc.) at the upper tier, with the lower tier providing bulk in-pool-compute operations (scan, encrypt, reduce, split, merge). Such operations may be enabled or coordinated through the use of acceleration capabilities or capabilities already built into memory technology devices.

The implementation of the memory pooling and memory pool interleaving may be enabled with use of a variety of distributed networked processing units. For instance, an implementation may include the use of a set of distributed IPUs coordinated according to the architectures discussed with reference to FIGS. 4 to 6, above, particularly in large network scenarios that include independent nodes. For instance, distributed IPUs may be used to evaluate key performance indicators (KPIs) and effects on quality of service (QoS) when deciding which tiers to use for memory pooling operations, or whether to combine multiple memory pools into a larger memory pool. Logic can also be implemented at IPUs among different base stations (platforms) or other entities, or at IPUs connected in a mesh network, to accomplish the logic operations discussed above. Data structures, databases or data stores, and other mappings may be used to track the state of the disaggregated memory resources and which resources are or are not part of the memory pool.

FIG. 10 depicts a scenario in a network architecture 1000, which deploys interleaving in a memory pool based on coordination of networked processing units (IPUs). In this scenario, consider an example where IPUs are used for managing or accessing resources at a set of base stations (e.g., base stations 910A-910D, providing compute resources 912A-912D and memory resources 914A-914D as discussed above).

IPUs enable connectivity and memory pooling among multiple network topologies, because data is synchronized through network connections to the individual IPUs. Likewise, IPUs may implement logic to enable another tier of pooling (e.g., a pool of pools). Accordingly, an IPU may operate at each platform/base station (e.g., IPU 1010A at base station 910A, IPU 1010B at base station 910B, IPU 1010C at base station 910C, and IPU 1010D at base station 910D).

In the scenario of FIG. 10, the switch 920 optionally includes an IPU 1020 to coordinate the logic operations for pooling (e.g., logic 922, 924, 926, 928, discussed above). In other examples, one or more IPUs outside of the switch 920 (e.g., coordinated by a distributed IPU mesh network 1030) can perform the memory pooling functions instead of the switch 920. In still other examples, one or more IPUs or the mesh network 1030 may combine operations with one or more switches to perform distributed management of the memory pool.

In further examples, IPUs may coordinate to identify, discover, and map different memory pools or memory pool configurations into different interleaving types or resource groupings. A variety of discovery or data processing mechanisms may be used to perform this mapping and to retain or store data that maps the disaggregated memory resources at the respective compute locations. Additionally, the IPUs may also be responsible for access to or coordination of other resources, including but not limited to in-memory computing, accelerated processing, or low-power operations (e.g., filter, sort, etc., or encryption/decryption operations).

FIG. 11 depicts a flowchart 1100 of a method for configuring interleaving in a memory pool established in an edge computing environment. The method 1100 may be implemented by one or more networked processing units or other forms of processing circuitry, executing instructions embodied thereon, consistent with the examples and functionality of networked processing units (e.g., IPUs) as discussed above.

At 1110, operations are performed to identify disaggregated memory resources at respective compute locations. In an example, the respective compute locations are connected to one another via at least one interconnect. For instance, the respective compute locations may correspond to processing hardware at respective base stations, as the client devices connect to the network via one or more of the respective base stations. Also for instance, one or more of the respective compute locations may include acceleration resources, as the disaggregated memory resources are mapped to the acceleration resources (e.g., with disaggregated memory resources that are connected to the acceleration resources via a Compute Express Link (CXL) interconnect).

At 1120, operations are performed to identify workload requirements for use of the compute locations by respective workloads. In an example, the workloads are provided by client devices to the compute locations via a network. In various examples, the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.

At 1130, operations are performed to determine an interleaving arrangement (e.g., configuration, scheme, or overlay) for a distributed memory pool that fulfills the workload requirements. In an example, the interleaving arrangement is provided in a virtual memory storage pool that is to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations. In a further example, this determination is based on categorizing the memory bandwidth available at the disaggregated memory resources into multiple categories, to enable the interleaving arrangement to be determined using the multiple categories.

At 1140, operations are performed (e.g., via commands, requests, or other operations) to configure (i.e., enable) the memory pool for use by the client devices of the network. This configuration of the memory pool causes the disaggregated memory resources to host data based on the interleaving arrangement. In some examples, a portion of the disaggregated memory resources at one or more compute locations is established without interleaving (e.g., determined to not utilize interleaving) based on the workload requirements.

At 1150, operations are performed to conduct memory storage and retrieval operations from the disaggregated memory resources, using the memory pool. For instance, this may include storing data in the memory pool (e.g., to multiple compute locations) according to the interleaving arrangement, and retrieving data from the memory pool (e.g., from multiple compute locations) according to the interleaving arrangement.

At 1160, additional operations are performed to determine an updated interleaving arrangement for the memory pool, such as based on changed workload requirements or network conditions. At 1170, the memory pool is reconfigured to provide memory resources with use of the updated interleaving arrangement.
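
Taken together, operations 1110-1170 can be summarized in the following self-contained sketch; every helper is a hypothetical stub, and only the sequencing mirrors the method of flowchart 1100.

```python
def discover_resources():                        # 1110
    return {"914A": 64, "914B": 64, "914C": 64, "914D": 64}  # GiB free

def identify_requirements(interleave=True):      # 1120
    return {"bandwidth_mbps": 1200, "interleave": interleave}

def determine_arrangement(resources, reqs):      # 1130 (and again at 1160)
    nodes = sorted(resources) if reqs["interleave"] else [min(resources)]
    return {"nodes": nodes, "chunk_bytes": 1 << 20}

def configure_pool(arrangement):                 # 1140 (and again at 1170)
    print("pool configured:", arrangement)

resources = discover_resources()
configure_pool(determine_arrangement(resources, identify_requirements()))
# 1150: storage and retrieval now proceed through the configured pool.
# 1160/1170: on changed requirements, recompute and reconfigure the pool.
configure_pool(determine_arrangement(resources, identify_requirements(False)))
```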

In further examples, the method of flowchart 1100 is performed by a network switch, and the method also includes (e.g., in connection with 1150) processing requests, at the network switch, for the use of the memory pool by the client devices of the network. Also in further examples, the method of flowchart 1100 is performed by a networked processing unit, and the method also includes implementing, at the networked processing unit, the interleaving arrangement among the disaggregated memory resources by causing the configuration of respective networked processing units at the respective compute locations.

Additional Examples

Additional examples of the presently described method, system, and device embodiments include the following, non-limiting implementations. Each of the following non-limiting examples may stand on its own or may be combined in any permutation or combination with any one or more of the other examples provided below or throughout the present disclosure.

Example 1 is a method for configuring interleaving in a memory pool established in an edge computing arrangement, comprising: identifying (e.g., mapping, discovering, retrieving, or receiving information for) disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identifying (e.g., retrieving, generating, accessing) workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determining an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configuring the memory pool (or, causing the memory pool to be configured) for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.

In Example 2, the subject matter of Example 1 optionally includes subject matter where the method is performed by a network switch, and wherein the method further comprises: processing requests, at the network switch, for the use of the memory pool by the client devices of the network.

In Example 3, the subject matter of any one or more of Examples 1-2 optionally include subject matter where the method is performed by a networked processing unit, and wherein the method further comprises: implementing, at the networked processing unit, the interleaving arrangement among the disaggregated memory resources by configuration of respective networked processing units at the respective compute locations.

In Example 4, the subject matter of any one or more of Examples 1-3 optionally include subject matter where the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.

In Example 5, the subject matter of any one or more of Examples 1-4 optionally include subject matter where the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.

In Example 6, the subject matter of any one or more of Examples 1-5 optionally include subject matter where one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.

In Example 7, the subject matter of Example 6 optionally includes subject matter where the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.

In Example 8, the subject matter of any one or more of Examples 1-7 optionally include categorizing memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.

In Example 9, the subject matter of any one or more of Examples 1-8 optionally include allocating a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.

In Example 10, the subject matter of any one or more of Examples 1-9 optionally include storing data in the memory pool according to the interleaving arrangement; and retrieving data in the memory pool according to the interleaving arrangement.

In Example 11, the subject matter of any one or more of Examples 1-10 optionally include determining an updated interleaving arrangement; and reconfiguring the memory pool for use by the client devices, based on the updated interleaving arrangement.

Example 12 is a device, comprising: a networked processing unit; and a storage medium including instructions embodied thereon, wherein the instructions, when executed by the networked processing unit, configure the networked processing unit to: identify (e.g., map, discover, retrieve, or receive information for) disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identify (e.g., retrieve, generate, access) workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determine an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configure the memory pool (or, cause the memory pool to be configured) for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.

In Example 13, the subject matter of Example 12 optionally includes subject matter where the device is a network switch, and wherein the instructions further configure the networked processing unit to: process requests, at the network switch, for the use of the memory pool by the client devices of the network.

In Example 14, the subject matter of any one or more of Examples 12-13 optionally include subject matter where the instructions further configure the networked processing unit to: provide commands to respective networked processing units at the respective compute locations, to cause the respective networked processing units to implement the interleaving arrangement among the disaggregated memory resources.

In Example 15, the subject matter of any one or more of Examples 12-14 optionally include subject matter where the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.

In Example 16, the subject matter of any one or more of Examples 12-15 optionally include subject matter where the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.

In Example 17, the subject matter of any one or more of Examples 12-16 optionally include subject matter where one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.

In Example 18, the subject matter of Example 17 optionally includes subject matter where the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.

In Example 19, the subject matter of any one or more of Examples 12-18 optionally include subject matter where the instructions further configure the networked processing unit to: categorize memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.

In Example 20, the subject matter of any one or more of Examples 12-19 optionally include subject matter where the instructions further configure the networked processing unit to: allocate a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.

In Example 21, the subject matter of any one or more of Examples 12-20 optionally include subject matter where the instructions further configure the networked processing unit to: store data in the memory pool according to the interleaving arrangement; and retrieve data in the memory pool according to the interleaving arrangement.

In Example 22, the subject matter of any one or more of Examples 12-21 optionally include subject matter where the instructions further configure the networked processing unit to: determine an updated interleaving arrangement; and reconfigure the memory pool for use by the client devices, based on the updated interleaving arrangement.

Example 23 is a machine-readable medium (e.g., a non-transitory storage medium) comprising information (e.g., data) representative of instructions, wherein the instructions, when executed by processing circuitry, cause the processing circuitry to perform, implement, or deploy any of Examples 1-22.

Example 24 is an apparatus of an edge computing system comprising means to implement any of Examples 1-23, or other subject matter described herein.

Example 25 is an apparatus of an edge computing system comprising logic, modules, circuitry, or other means to implement any of Examples 1-23, or other subject matter described herein.

Example 26 is a networked processing unit (e.g., an infrastructure processing unit as discussed herein) or a system including a networked processing unit, configured to implement any of Examples 1-23, or other subject matter described herein.

Example 27 is an edge computing system, including respective edge processing devices and nodes to invoke or perform any of the operations of Examples 1-23, or other subject matter described herein.

Example 28 is an edge computing system including aspects of network functions, acceleration functions, acceleration hardware, storage hardware, or computation hardware resources, operable to invoke or perform the use cases discussed herein, with use of any of Examples 1-23, or other subject matter described herein.

Example 29 is a system to implement any of Examples 1-28.

Example 30 is a method to implement any of Examples 1-28.

Although these implementations have been described with reference to specific exemplary aspects, it will be evident that various modifications and changes may be made to these aspects without departing from the broader scope of the present disclosure. Many of the arrangements and processes described herein can be used in combination or in parallel implementations that involve terrestrial network connectivity (where available) to increase network bandwidth/throughput and to support additional edge services. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show, by way of illustration, and not of limitation, specific aspects in which the subject matter may be practiced. The aspects illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other aspects may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various aspects is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Such aspects of the inventive subject matter may be referred to herein, individually and/or collectively, merely for convenience and without intending to voluntarily limit the scope of this application to any single aspect or inventive concept if more than one is disclosed. Thus, although specific aspects have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific aspects shown. This disclosure is intended to cover any adaptations or variations of various aspects. Combinations of the above aspects, and other aspects not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.

What is claimed is:
 1. A method for configuring interleaving in a memory pool established in an edge computing arrangement, comprising: discovering disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identifying workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determining an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configuring the memory pool for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.
 2. The method of claim 1, wherein the method is performed by a network switch, and wherein the method further comprises: processing requests, at the network switch, for the use of the memory pool by the client devices of the network.
 3. The method of claim 1, wherein the method is performed by a networked processing unit, and wherein the method further comprises: implementing, at the networked processing unit, the interleaving arrangement among the disaggregated memory resources by configuration of respective networked processing units at the respective compute locations.
 4. The method of claim 1, wherein the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
 5. The method of claim 1, wherein the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.
 6. The method of claim 1, wherein one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.
 7. The method of claim 6, wherein the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.
 8. The method of claim 1, further comprising: categorizing memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.
 9. The method of claim 1, further comprising: allocating a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.
 10. The method of claim 1, further comprising: storing data in the memory pool according to the interleaving arrangement; and retrieving data in the memory pool according to the interleaving arrangement.
 11. The method of claim 1, further comprising: determining an updated interleaving arrangement; and reconfiguring the memory pool for use by the client devices, based on the updated interleaving arrangement.
 12. A device, comprising: a networked processing unit; and a storage medium including instructions embodied thereon, wherein the instructions, when executed by the networked processing unit, configure the networked processing unit to: discover disaggregated memory resources at respective compute locations, the compute locations connected to one another via at least one interconnect; identify workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determine an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configure the memory pool for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.
 13. The device of claim 12, wherein the device is a network switch, and wherein the instructions further configure the networked processing unit to: process requests, at the network switch, for the use of the memory pool by the client devices of the network.
 14. The device of claim 12, wherein the instructions further configure the networked processing unit to: provide commands to respective networked processing units at the respective compute locations, to cause the respective networked processing units to implement the interleaving arrangement among the disaggregated memory resources.
 15. The device of claim 12, wherein the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
 16. The device of claim 12, wherein the respective compute locations correspond to processing hardware at respective base stations, and wherein the client devices connect to the network via one or more of the respective base stations.
 17. The device of claim 12, wherein one or more of the respective compute locations include acceleration resources, and wherein the disaggregated memory resources are mapped to the acceleration resources.
 18. The device of claim 17, wherein the disaggregated memory resources are connected to the acceleration resources via a Compute Express Link (CXL) interconnect.
 19. The device of claim 12, wherein the instructions further configure the networked processing unit to: categorize memory bandwidth available at the disaggregated memory resources into multiple categories; wherein the interleaving arrangement is determined using the multiple categories.
 20. The device of claim 12, wherein the instructions further configure the networked processing unit to: allocate a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the workload requirements.
 21. The device of claim 12, wherein the instructions further configure the networked processing unit to: store data in the memory pool according to the interleaving arrangement; and retrieve data in the memory pool according to the interleaving arrangement.
 22. The device of claim 12, wherein the instructions further configure the networked processing unit to: determine an updated interleaving arrangement; and reconfigure the memory pool for use by the client devices, based on the updated interleaving arrangement.
 23. A non-transitory machine-readable storage medium comprising information representative of instructions, wherein the instructions, when executed by processing circuitry, cause the processing circuitry to: select disaggregated memory resources from respective compute locations, the compute locations connected to one another via at least one interconnect; generate workload requirements for use of the compute locations by respective workloads, the workloads provided by client devices to the compute locations via a network; determine an interleaving arrangement for a memory pool that fulfills the workload requirements, the interleaving arrangement to distribute data for the respective workloads among the disaggregated memory resources at the respective compute locations; and configure the memory pool for use by the client devices of the network, the memory pool to cause the disaggregated memory resources among the compute locations to host data based on the interleaving arrangement.
 24. The non-transitory machine-readable storage medium of claim 23, wherein the workload requirements are identified based on one or more of: a latency measurement for use of compute resources at the respective compute locations; an estimation of an availability of acceleration resources for current workloads in the network; a prediction of an availability of acceleration resources for future workloads in the network; a latency measurement for communications in the network; an estimation of current traffic in the network; or a prediction of bandwidth or load requirements in the network.
 25. The non-transitory machine-readable storage medium of claim 23, wherein the instructions further configure the processing circuitry to: categorize memory bandwidth available at the disaggregated memory resources into multiple categories, wherein the interleaving arrangement is determined using the multiple categories; and allocate a portion of the disaggregated memory resources at one or more compute locations without interleaving based on the multiple categories.